Computing Paraphrasability of Syntactic Variants Using Web Snippets
Abstract
A large-scale knowledge base of paraphrases is expected to improve performance across a broad range of natural language processing tasks. The key issue in creating such a resource is to establish a practical method of computing semantic equivalence and syntactic substitutability, i.e., paraphrasability, between a given pair of expressions. This paper addresses the problem of computing paraphrasability, focusing on syntactic variants of predicate phrases. Our model estimates paraphrasability based on traditional distributional similarity measures, using Web snippets to overcome the data sparseness problem in handling predicate phrases. Several feature sets are evaluated through empirical experiments.
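As a rough illustration of the general idea of distributional similarity over snippets, the following minimal sketch compares two phrases by the words that co-occur with them. It is not the paper's model: the bag-of-words features, the cosine measure, the function names, and the hard-coded snippets are all assumptions made for illustration, and a real system would obtain its snippets from a search engine.

```python
import math
import re
from collections import Counter


def context_features(phrase, snippets):
    """Count the words that co-occur with `phrase` in each snippet (bag-of-words contexts)."""
    features = Counter()
    phrase_tokens = phrase.lower().split()
    n = len(phrase_tokens)
    for snippet in snippets:
        tokens = re.findall(r"[a-z']+", snippet.lower())
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == phrase_tokens:
                # Every other token in the snippet counts as a context feature.
                features.update(tokens[:i] + tokens[i + n:])
    return features


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical snippets standing in for search-engine results.
snippets_a = [
    "the committee will make a decision on the budget tomorrow",
    "judges make a decision after the hearing",
]
snippets_b = [
    "the board will decide on the budget next week",
    "judges decide after the hearing",
]

features_a = context_features("make a decision", snippets_a)
features_b = context_features("decide", snippets_b)
score = cosine(features_a, features_b)
print(f"similarity: {score:.3f}")
```

A higher score means the two phrases appear in more similar contexts. Note that this captures only distributional similarity; the abstract's notion of paraphrasability also covers syntactic substitutability, which the sketch does not model.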
Similar resources
Finding Distinct Answers in Web Snippets
This paper presents ListWebQA, a question answering system aimed specifically at discovering answers to list questions in web snippets. ListWebQA retrieves snippets likely to contain answers by means of a query rewriting strategy, and extracts answers according to their syntactic and semantic similarities afterwards. These similarities are determined by means of a set of surface syntactic patte...
Mining Web Snippets to Answer List Questions
This paper presents ListWebQA, a question answering system that is aimed specifically at extracting answers to list questions exclusively from web snippets. Answers are identified in web snippets by means of their semantic and syntactic similarities. Initial results show that they are a promising source of answers to list questions.
International Journal of Soft Computing and Engineering
Semantic similarity measures play an important role in the extraction of semantic relations. They are widely used in Natural Language Processing (NLP) and Information Retrieval (IR). The work proposed here uses web-based metrics to compute the semantic similarity between words or terms and compares the results with the state of the art. For a computer to decide the semantic sim...
Using the Web as a Corpus for the Syntactic-Based Collocation Identification
This paper presents an experiment that uses a Web search engine and a robust parser for the Web-based identification of collocations (statistically significant word associations representing “a conventional way of saying things” (Manning and Schütze, 1999)). We identify the possible collocates of a given word by parsing the text snippets returned by the search engine when querying that word. Th...
Leveraging Flawed Tutorials for Seeding Large-Scale Web Vulnerability Discovery
The Web is replete with tutorial-style content on how to accomplish programming tasks. Unfortunately, even top-ranked tutorials suffer from severe security vulnerabilities, such as cross-site scripting (XSS), and SQL injection (SQLi). Assuming that these tutorials influence real-world software development, we hypothesize that code snippets from popular tutorials can be used to bootstrap vulnera...
Publication date: 2008